
Table of contents

  • 14.1 Dataset
  • 14.2 Set-up
    • 14.2.1 Quarto YAML
    • 14.2.2 Packages
    • 14.2.3 Data
  • 14.3 Model set up
    • 14.3.1 Variable transformations
    • 14.3.2 Model selection
  • 14.4 Linear mixed model
    • 14.4.1 Fit a model
    • 14.4.2 Report results
  • 14.5 Generalised linear mixed model
    • 14.5.1 Fit a model
    • 14.5.2 Report results
  • 14.6 Interpretation
  • 14.7 Render

14  Report 2

(Generalised) linear mixed models

The goal of this report is to review and consolidate what we learned together in the second block of the course. You are not required to do anything that we have not already covered.

For students enrolled in this course in the Winter Semester 2023/24: The report is due March 29, 2024 at 11:59pm. Please submit your Quarto script, as well as a rendered copy in HTML and PDF to Moodle (under ‘Reports’).

14.1 Dataset

For this report you will continue using the data from Biondo et al. (2022), an eye-tracking reading study of adverb-tense congruence effects on reading time measures. Participants’ eye movements were recorded as they read Spanish sentences in which temporal adverbs and verb tense were either congruent or incongruent. For both sentence regions, the time reference was either past (e.g., yesterday, bought) or future (e.g., tomorrow, will buy). Example stimuli from this experiment are given in Table 14.1.

Table 14.1: Example stimuli
| sentence | adverb | verb | gramm |
|----------|--------|------|-------|
| A la salida del trabajo, **ayer** las chicas **compraron** pan en la tienda.<br> *After leaving work* **yesterday** *the girls* **bought** *bread at the shop* | past | past | gramm |
| A la salida del trabajo, **ayer** las chicas **\*comprarán** pan en la tienda.<br> *After leaving work* **yesterday** *the girls* **\*will buy** *bread at the shop* | past | future | ungramm |
| A la salida del trabajo, **mañana** las chicas **comprarán** pan en la tienda.<br> *After leaving work* **tomorrow** *the girls* **will buy** *bread at the shop* | future | future | gramm |
| A la salida del trabajo, **mañana** las chicas **\*compraron** pan en la tienda.<br> *After leaving work* **tomorrow** *the girls* **\*bought** *bread at the shop* | future | past | ungramm |

You will be fitting models to different eye-tracking reading measures from this experiment, with the predictors adverb time and grammaticality.

14.2 Set-up

Make sure you begin with a clear working environment. To achieve this, you can go to Session > Restart R. Your Environment should have no objects in it, and you should not have any packages loaded.

14.2.1 Quarto YAML

Make sure your YAML looks something like this:

---
title: "Report 2"
author: "My Name"
format:
  html: default
  pdf: default
toc: true
number-sections: true
---

**Render often**

I suggest you render your document frequently, e.g., after every substantial code chunk or completed task. This ensures earlier detection of broken code and makes problems easier to fix. Do this for both HTML and PDF.

14.2.2 Packages

Load the following packages, however you prefer (i.e., you don’t have to use pacman::p_load()); one possible approach is sketched after the list:

  • tidyverse
  • janitor
  • here
  • broom.mixed
  • lattice
  • lme4
  • lmerTest
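
For example, a minimal sketch using pacman (assuming pacman itself is already installed; seven separate library() calls work just as well):

```{r}
# Load all required packages in one call (p_load() also installs any that are missing)
pacman::p_load(tidyverse, janitor, here, broom.mixed, lattice, lme4, lmerTest)
```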

Describe what each of the following packages is used for (in our experience, each offers many more useful functions than the ones we’ve tried).

  1. broom.mixed:
  2. lattice:
  3. lme4:
  4. lmerTest:

14.2.3 Data

Load in the Biondo et al. (2022) data by running the following code chunk.

```{r}
df_biondo <-
  read_csv(here("data", "Biondo.Soilemezidi.Mancini_dataset_ET.csv"),
           locale = locale(encoding = "Latin1") ## for special characters in Spanish
           ) |>
  clean_names() |>
  mutate(gramm = ifelse(gramm == "0", "ungramm", "gramm")) |>
  mutate_if(is.character, as_factor) |> # all character variables as factors
  filter(adv_type == "Deic") |>
  droplevels() |>
  mutate(
    roi_length = str_length(label)
  ) |>
  relocate(roi_length, .after = label)
```

The last few lines add a new variable (roi_length) that contains region length (in letters). We will use this as a covariate in one of our models.
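
After loading, it is worth a quick sanity check of the resulting data frame, for example:

```{r}
# Quick look at column names, types, and a preview of the values
glimpse(df_biondo)
```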

14.3 Model set up

You will be asked to run two models: one linear mixed model (lmer() from the lme4 or lmerTest package) and one generalised (logistic) linear mixed model (glmer(family = "binomial") from the lme4 package).

14.3.1 Variable transformations

For each model, consider whether you need to implement the following steps (a rough sketch of the transformation steps is given after this list):

  • centre (sum contrast code) categorical predictors
  • standardize continuous predictors (e.g., using the scale() function)
  • log-transform continuous dependent variables if skewed
  • model selection: begin with a maximal model
    • simplify in case of nonconvergence or singular fit
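
As an illustration, the first three steps might look something like the sketch below. The column names (tt, roi_length) and the gramm levels are taken from the task description and the loading code above; check the levels of adv_t (and any other factor) yourself before coding it the same way.

```{r}
df_biondo <- df_biondo |>
  mutate(
    # centre/sum code a two-level factor as -0.5/+0.5
    gramm_sum = ifelse(gramm == "gramm", -0.5, +0.5),
    # standardise a continuous predictor (as.numeric() drops the matrix attributes)
    roi_length_z = as.numeric(scale(roi_length)),
    # log-transform a skewed continuous dependent variable (assumes positive values)
    log_tt = log(tt)
  )
```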

14.3.2 Model selection

For each model, start with a “maximal” model justified by the design. If you encounter convergence issues, begin by implementing “unintrusive” remedies. If you still have convergence issues (as indicated by warning messages and/or by, e.g., inspecting the variance-covariance matrix), reduce the random effects structure as you see fit. Be sure to document and justify your decisions step by step. N.B., the equivalent of the lmerControl argument (for lmer() models) is glmerControl for glmer() models.

If you choose to use the lme4::allFit() function, beware that it can take a long time to run, especially on ‘maximal’ models. I suggest you (i) save the output as an object (e.g., allFit_model1 <- allFit(model1)) and (ii) plan another task that doesn’t involve running code when you run this function.
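
If it helps, here is a rough sketch of both ideas; model1 is a placeholder for a model you have already fitted, and this is a starting point rather than a recipe:

```{r}
#| eval: false

# Refit the model with every available optimizer (this can take a while)
allFit_model1 <- allFit(model1)
summary(allFit_model1)$which.OK  # which optimizers ran without error
summary(allFit_model1)$msgs      # convergence messages, per optimizer

# Switch optimizer via the control argument
# (lmerControl() for lmer() models, glmerControl() for glmer() models)
model1_bobyqa <- update(model1, control = lmerControl(optimizer = "bobyqa"))
```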

I am not expecting one particular model or random effects structure to be “correct”; rather, I am looking for explanations of how you decided what to remove from or keep in your model.

14.4 Linear mixed model

Fit a linear mixed model to total reading times (tt) at the adverb region (roi == 2). Your fixed effects are adverb time reference (adv_t), grammaticality (gramm), their interaction, and (standardized) region length in characters as a covariate without any interaction. Include by-participant and -item random effects.

14.4.1 Fit a model

Start by defining the maximal model justified by your design, and simplify accordingly. Remember not to delete the code for nonconverging models; instead, set the code chunk to not run when you render your document, as in the code chunk below (#| eval: false).

```{r}
#| eval: false

fit_some_maximal_model <- 
  lmer(dependent_variable ~ predictor1*predictor2 + covariate +
         (1 + predictor1*predictor2|participant) +
         (1 + predictor1*predictor2|item),
       data = my_data,
       subset = some_factor == "some_level")
# informative comment, e.g., "didn't converge"
```

14.4.2 Report results

Once you’ve landed on a final model that converges, inspect the fixed and random effects (some useful functions we’ve already seen: summary(), broom.mixed::tidy(), fixef(), ranef(), coef(), lattice::dotplot()).
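
For example, with fit_final as a placeholder name for your final model:

```{r}
#| eval: false

summary(fit_final)                   # full summary: fixed and random effects
broom.mixed::tidy(fit_final)         # the same estimates as a tidy data frame
fixef(fit_final)                     # fixed-effect coefficients only
ranef(fit_final)                     # by-participant and by-item adjustments
coef(fit_final)                      # fixed and random effects combined, per group
lattice::dotplot(ranef(fit_final))   # plot the random effects
```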

14.5 Generalised linear mixed model

We didn’t cover how to implement logistic mixed regression; however, the relationship between lmer() and glmer() is the same as that between lm() and glm().
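
Schematically (with placeholder variable names), the parallel looks like this:

```{r}
#| eval: false

lm(y ~ x, data = d)                         # linear model
glm(y ~ x, data = d, family = "binomial")   # logistic regression
lmer(y ~ x + (1 | subject), data = d)       # linear mixed model
glmer(y ~ x + (1 | subject), data = d,      # logistic mixed model
      family = "binomial")
```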

14.5.1 Fit a model

Fit a generalised linear mixed model (glmer() from the lme4 package; lmerTest does not have this function) to regressions in (ri) at the adverb region (roi == 2). Your fixed effects are adverb time reference (adv_t), grammaticality (gramm), and their interaction. Remember to use eval: false in your code chunk options to stop Quarto from running all your non-final models when rendering.
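
One possible starting point is sketched below. The participant and item column names (here sj and item) are assumptions on my part, so check names(df_biondo) and adjust them, and simplify the random effects structure as needed.

```{r}
#| eval: false

fit_ri_max <-
  glmer(ri ~ adv_t * gramm +
          (1 + adv_t * gramm | sj) +     # assumed participant column name
          (1 + adv_t * gramm | item),    # assumed item column name
        data = df_biondo,
        subset = roi == 2,
        family = "binomial")
```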

14.5.2 Report results

Once you’ve landed on a final model that converges, inspect the fixed and random effects (some useful functions we’ve already seen: summary(), broom.mixed::tidy(), fixef(), ranef(), coef(), lattice::dotplot()).

Recall that our coefficient estimates are in log odds. The interpretation of your coefficient estimates (fixed effects) is identical to that in generalised linear models (i.e., without random effects).
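
If it helps your interpretation, the estimates can be converted back from the log-odds scale (fit_ri is a placeholder for your final model):

```{r}
#| eval: false

fixef(fit_ri)          # estimates on the log-odds scale
exp(fixef(fit_ri))     # odds ratios
plogis(fixef(fit_ri))  # probabilities (most meaningful for the intercept)
```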

14.6 Interpretation

Write a short report of the findings from the two models. Produce a table and a plot, as in the example above, to supplement your report.
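
One possible way to produce such a table (this may differ from the example referenced above; knitr ships with Quarto, and fit_final is a placeholder name):

```{r}
#| eval: false

# Tidy the fixed effects and print them as a formatted table
broom.mixed::tidy(fit_final, effects = "fixed") |>
  knitr::kable(digits = 3)
```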

14.7 Render

Render your finished Quarto script. Upload the .qmd, .pdf, and .html files to Moodle. N.B., you need to have tinytex installed to be able to render PDFs.

Biondo, N., Soilemezidi, M., & Mancini, S. (2022). Yesterday is history, tomorrow is a mystery: An eye-tracking investigation of the processing of past and future time reference during sentence reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 48(7), 1001–1018. https://doi.org/10.1037/xlm0001053